- Progress in machine learning and artificial intelligence promises to advance research and understanding across a wide range of fields and activities. In tandem, increased awareness of the importance of open data for reproducibility and scientific transparency is making inroads in fields that have not traditionally produced large publicly available datasets. Data sharing requirements from publishers and funders, as well as from other stakeholders, have also created pressure to make datasets with research and/or public interest value available through digital repositories. However, to make the best use of existing data, and to facilitate the creation of useful future datasets, robust, interoperable and usable standards need to evolve and adapt over time. The open-source development model offers significant potential benefits to the process of standard creation and adaptation. In particular, data and metadata standards can use long-standing technical and socio-technical processes that have been key to managing the development of software, and that allow broad community input to be incorporated into the formulation of these standards. On the other hand, open-source models carry unique risks that need to be considered. This report surveys existing open-source standards development, addressing these benefits and risks. It outlines recommendations for standards developers, funders and other stakeholders on the path to robust, interoperable and usable open-source data and metadata standards.
- The Human Connectome Project (HCP) has become a keystone dataset in human neuroscience, with a plethora of important applications in advancing brain imaging methods and our understanding of the human brain. We focused on tractometry of HCP diffusion-weighted MRI (dMRI) data. We used an open-source software library (pyAFQ; https://yeatmanlab.github.io/pyAFQ) to perform probabilistic tractography and delineate the major white matter pathways in the HCP subjects with a complete dMRI acquisition (n = 1,041). We used diffusion kurtosis imaging (DKI) to model white matter microstructure in each voxel of the white matter, and extracted tract profiles of DKI-derived tissue properties along the length of the tracts. We explored the empirical properties of the data: first, we assessed the heritability of DKI tissue properties using the known genetic linkage of the large number of twin pairs sampled in HCP. Second, we tested the ability of tractometry to serve as the basis for predictive models of individual characteristics (e.g., age, crystallized/fluid intelligence, reading ability), compared to local connectome features. To facilitate exploration of the dataset, we created a new web-based visualization tool and used it to visualize the data in the HCP tractometry dataset. Finally, we used the HCP dataset as a test bed for a new technological innovation: the TRX file format for representing dMRI-based streamlines. We released the processing outputs and tract profiles as a publicly available data resource through the AWS Open Data program's Open Neurodata repository. We found heritability as high as 0.9 for DKI-based metrics in some brain pathways. We also found that tractometry extracts as much useful information about individual differences as the local connectome method. We released a new web-based visualization tool for tractometry, "Tractoscope" (https://nrdg.github.io/tractoscope). We found that TRX files require considerably less disk space, a crucial attribute for large datasets like HCP. In addition, TRX incorporates a specification for grouping streamlines, further simplifying tractometry analysis.
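The central tract-profile computation described above can be sketched in a few lines of NumPy: resample each streamline to a fixed number of nodes and average a scalar map (e.g., a DKI-derived tissue property) at each node. This is a minimal illustration under simplifying assumptions (nearest-neighbor sampling, no weighting of streamlines by distance from the bundle core), not pyAFQ's actual implementation; all names are hypothetical.

```python
import numpy as np

def tract_profile(streamlines, scalar_vol, affine, n_nodes=100):
    """Mean of a scalar map at each of n_nodes positions along a bundle.

    streamlines : list of (k_i, 3) arrays of points in world (mm) coordinates
    scalar_vol  : 3D array holding the tissue property (e.g., FA)
    affine      : (4, 4) voxel-to-world affine of scalar_vol
    """
    inv = np.linalg.inv(affine)
    profile = np.zeros(n_nodes)
    for sl in streamlines:
        # Resample to n_nodes equally spaced points along the arc length.
        d = np.r_[0, np.cumsum(np.linalg.norm(np.diff(sl, axis=0), axis=1))]
        t = np.linspace(0, d[-1], n_nodes)
        nodes = np.column_stack([np.interp(t, d, sl[:, i]) for i in range(3)])
        # Map world coordinates to voxel indices (nearest neighbor).
        ijk = (inv @ np.c_[nodes, np.ones(n_nodes)].T)[:3].round().astype(int).T
        profile += scalar_vol[ijk[:, 0], ijk[:, 1], ijk[:, 2]]
    return profile / len(streamlines)
```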
- Stem‐mapped forest stands offer important opportunities for investigating the fine‐scale spatial processes occurring in forest ecosystems. These stands are areas of the forest where the precise locations and repeated size measurements of each tree are recorded, enabling the calculation of spatially‐explicit metrics of individual growth rates and of the entire tree community. The most common use of these datasets is to investigate the drivers of variation in forest processes by modeling tree growth rate or mortality as a function of these neighborhood metrics. However, neighborhood metrics could also serve as important covariates of many other spatially variable forest processes, including seedling recruitment, herbivory and soil microbial community composition. Widespread use of stem‐mapped forest stand datasets is currently hampered by the lack of standardized, efficient and easy‐to‐use tools to calculate tree dynamics (e.g. growth, mortality) and the neighborhood metrics that impact them. We present the forestexplorR package, which facilitates the munging, exploration, visualization and analysis of stem‐mapped forest stands. By providing flexible, user‐friendly functions that calculate neighborhood metrics and implement a recently‐developed rapid‐fitting tree growth and mortality model, forestexplorR broadens the accessibility of stem‐mapped forest stand data. We demonstrate the functionality of forestexplorR by using it to investigate how the species identity of neighboring trees influences the growth rates of three common tree species in Mt Rainier National Park, WA, USA. forestexplorR is designed to help researchers incorporate spatially‐explicit descriptions of tree communities into their studies, and we expect this broader community of contributors to develop exciting new ways of using stem‐mapped forest stand data.
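forestexplorR is an R package; purely to illustrate the kind of neighborhood metric that stem-mapped data support, here is a minimal Python sketch of the classic Hegyi competition index (a standard distance-weighted size ratio). The function name and conventions are hypothetical and do not reflect forestexplorR's API.

```python
import numpy as np

def hegyi_index(focal_xy, focal_dbh, nbr_xy, nbr_dbh, radius=10.0):
    """Hegyi competition index for one focal tree: the sum, over all
    neighbors within `radius` meters, of (neighbor DBH / focal DBH)
    divided by the distance to that neighbor.

    focal_xy : (2,) stem-map coordinates of the focal tree (m)
    nbr_xy   : (n, 2) coordinates of candidate neighbors (m)
    """
    dist = np.linalg.norm(nbr_xy - focal_xy, axis=1)
    in_zone = (dist > 0) & (dist <= radius)  # exclude the focal tree itself
    return np.sum((nbr_dbh[in_zone] / focal_dbh) / dist[in_zone])
```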
- Background: Ridge regression is a regularization technique that penalizes the L2-norm of the coefficients in linear regression. One of the challenges of using ridge regression is the need to set a hyperparameter (α) that controls the amount of regularization. Cross-validation is typically used to select the best α from a set of candidates, but efficient and appropriate selection of α can be challenging, and it becomes prohibitive when large amounts of data are analyzed. Because the selected α depends on the scale of the data and the correlations across predictors, it is also not straightforwardly interpretable. Results: The present work addresses these challenges through a novel approach to ridge regression. We propose to reparameterize ridge regression in terms of the ratio γ between the L2-norms of the regularized and unregularized coefficients. We provide an algorithm that efficiently implements this approach, called fractional ridge regression, as well as open-source software implementations in Python and MATLAB (https://github.com/nrdg/fracridge). We show that the proposed method is fast and scalable for large-scale data problems. In brain imaging data, we demonstrate that this approach delivers results that are straightforward to interpret and to compare across models and datasets. Conclusion: Fractional ridge regression has several benefits: the solutions obtained for different values of γ are guaranteed to vary, guarding against wasted calculations, and they automatically span the relevant range of regularization, avoiding the need for arduous manual exploration. These properties make fractional ridge regression particularly suitable for the analysis of large, complex datasets.
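The reparameterization can be sketched directly from the SVD of the design matrix: the norm of the ridge solution shrinks monotonically as α grows, so the α that yields a target fraction γ of the OLS norm can be recovered by interpolation over a grid. This is a minimal sketch of the idea, not the fracridge package's (far more efficient) algorithm.

```python
import numpy as np

def frac_ridge(X, y, frac):
    """Ridge solution whose L2-norm is `frac` (= gamma) times the norm
    of the unregularized OLS solution."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    uty = U.T @ y
    ols = Vt.T @ (uty / s)                 # alpha = 0 (OLS) solution
    target = frac * np.linalg.norm(ols)

    # ||beta(alpha)|| decreases monotonically from ||ols|| toward 0,
    # so a log-spaced grid plus interpolation locates the matching alpha.
    alphas = np.logspace(-6, 10, 200) * s.max() ** 2
    norms = np.array(
        [np.linalg.norm(Vt.T @ (s * uty / (s**2 + a))) for a in alphas]
    )
    a_star = np.interp(target, norms[::-1], alphas[::-1])
    return Vt.T @ (s * uty / (s**2 + a_star))
```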
- Neighborhood models have allowed us to test many hypotheses regarding the drivers of variation in tree growth, but they require considerable computation due to the many empirically supported non-linear relationships they include. Regularized regression represents a far more efficient neighborhood modeling method, but it is unclear whether such an ecologically unrealistic model can provide accurate insights into tree growth. Rapid computation is becoming increasingly important as ecological datasets grow in size, and may be essential when using neighborhood models to predict tree growth beyond sample plots or into the future. We built a novel regularized regression model of tree growth and investigated whether it reached the same conclusions as a commonly used neighborhood model regarding hypotheses of how tree growth is influenced by the species identity of neighboring trees. We also evaluated the ability of both models to interpolate the growth of trees not included in the model-fitting dataset. Our regularized regression model replicated most of the classical model's inferences in a fraction of the time, without using high-performance computing resources. We found that both methods could interpolate out-of-sample tree growth, but the method making the most accurate predictions varied among focal species. Regularized regression is particularly efficient for comparing hypotheses because it automates the process of model selection and can handle correlated explanatory variables. This feature means that regularized regression could also be used to select among potential explanatory variables (e.g., climate variables) and thereby streamline the development of a classical neighborhood model. Both regularized regression and classical methods can interpolate out-of-sample tree growth, but future research must determine whether predictions can be extrapolated to trees experiencing novel conditions. Overall, we conclude that regularized regression methods can complement classical methods in the investigation of tree growth drivers and represent a valuable tool for advancing this field toward prediction.
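To illustrate why regularized regression automates model selection in the presence of correlated covariates, here is a minimal scikit-learn sketch with synthetic data standing in for per-tree neighborhood covariates; it is not the authors' actual model, and L1 (lasso) regularization is shown as one representative choice.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: columns of X would be neighborhood covariates
# (e.g., summed sizes of neighbors of each species); y is log growth.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500)

# Cross-validation picks the penalty strength; uninformative or
# redundant covariates get coefficients shrunk exactly to zero,
# which is the automated model selection referred to above.
model = make_pipeline(StandardScaler(), LassoCV(cv=5)).fit(X, y)
print(model[-1].coef_)
```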
- Dimitriadis, Stavros I (Ed.) The analysis of brain-imaging data requires complex processing pipelines to support findings on brain function or pathologies. Recent work has shown that variability in analytical decisions, small amounts of noise, or differences in computational environments can lead to substantial differences in results, endangering trust in conclusions. We explored the instability of results by instrumenting a structural connectome estimation pipeline with Monte Carlo Arithmetic to introduce random noise throughout. We evaluated the reliability of the connectomes, the robustness of their features, and the eventual impact on analysis. The stability of results was found to range from perfectly stable (i.e., all digits of the data significant) to highly unstable (i.e., 0–1 significant digits). This paper highlights the potential of leveraging induced variance in estimates of brain connectivity to reduce bias in networks without compromising reliability, while increasing the robustness, and the potential upper bound, of their applications in the classification of individual differences. We demonstrate that stability evaluations are necessary for understanding the error inherent in brain imaging experiments, and show how numerical analysis can be applied to typical analytical workflows in brain imaging and other domains of computational science, as the techniques used are data and context agnostic and globally relevant. Overall, while the extreme variability in results due to analytical instabilities could severely hamper our understanding of brain organization, it also affords us an opportunity to increase the robustness of findings.
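The flavor of the approach can be sketched as follows: inject random relative noise at a chosen virtual precision, rerun the pipeline many times, and estimate the number of significant digits from the spread of the outputs. Real MCA tools (e.g., Verificarlo) instrument every floating-point operation; perturbing values directly, as below, is only a coarse stand-in, and both functions are illustrative.

```python
import numpy as np

def mc_perturb(x, t=24, rng=None):
    """Add random relative noise of magnitude ~2**-t to x, mimicking
    the 'virtual precision' of Monte Carlo Arithmetic."""
    rng = rng or np.random.default_rng()
    return x * (1 + rng.uniform(-1, 1, np.shape(x)) * 2.0**-t)

def significant_digits(samples):
    """Estimate significant decimal digits across repeated noisy runs
    as -log10(std / |mean|), capped near float64's ~15 digits."""
    m, sd = np.mean(samples), np.std(samples)
    return 15.0 if sd == 0 else min(15.0, -np.log10(sd / abs(m)))
```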
- Tractography has created new horizons for researchers to study brain connectivity in vivo. However, tractography is an advanced and challenging method that has not yet been used for large-scale medical data analysis to the same extent as more traditional brain imaging methods. This work allows tractography to be used for large-scale, high-quality medical analytics. BUndle ANalytics (BUAN) is a fast, robust, and flexible computational framework for real-world tractometric studies. BUAN combines tractography and anatomical information to analyze challenging datasets, and it identifies significant group differences in specific locations of the white matter bundles. Additionally, BUAN takes the shape of the bundles into consideration in the analysis: it compares the shapes of bundles using a metric called bundle adjacency, which calculates the shape similarity between two given bundles, and builds networks of bundle shape similarities that can be paramount for automating quality control. BUAN is freely available in DIPY. Results are presented using publicly available Parkinson's Progression Markers Initiative data.
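One plausible reading of the bundle-adjacency idea, sketched with NumPy: measure, symmetrically, what fraction of each bundle's streamlines lie within a distance threshold of the other bundle, using the standard mean-direct-flip streamline distance. The exact definition is given in the BUAN paper and DIPY's implementation; this sketch assumes streamlines pre-resampled to a common number of points.

```python
import numpy as np

def mdf(s1, s2):
    """Mean direct-flip distance between two streamlines that have been
    resampled to the same number of points."""
    direct = np.mean(np.linalg.norm(s1 - s2, axis=1))
    flipped = np.mean(np.linalg.norm(s1 - s2[::-1], axis=1))
    return min(direct, flipped)

def bundle_adjacency(bundle_a, bundle_b, threshold=5.0):
    """Symmetric shape similarity: average, over both directions, of the
    fraction of streamlines within `threshold` mm of the other bundle."""
    def coverage(src, dst):
        return np.mean([min(mdf(s, t) for t in dst) < threshold for s in src])
    return 0.5 * (coverage(bundle_a, bundle_b) + coverage(bundle_b, bundle_a))
```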
- The neural pathways that carry information from the foveal, macular, and peripheral visual fields have distinct biological properties. The optic radiations (OR) carry foveal and peripheral information from the thalamus to the primary visual cortex (V1) through adjacent but separate pathways in the white matter. Here, we perform white matter tractometry using pyAFQ on a large sample of diffusion MRI (dMRI) data from subjects with healthy vision in the U.K. Biobank dataset (UKBB; N = 5,382; age 45–81). We use pyAFQ to characterize white matter tissue properties in parts of the OR that transmit information about the foveal, macular, and peripheral visual fields, and to characterize the changes in these tissue properties with age. We find that (1) independent of age, there is higher fractional anisotropy, lower mean diffusivity, and higher mean kurtosis in the foveal and macular OR than in the peripheral OR, consistent with denser, more organized nerve fiber populations in foveal/parafoveal pathways, and (2) age is associated with increased diffusivity and decreased anisotropy and kurtosis, consistent with decreased density and tissue organization with aging. However, anisotropy in the foveal OR decreases faster with age than in the peripheral OR, while diffusivity increases faster in the peripheral OR, suggesting that foveal/peri‐foveal OR and peripheral OR differ in how they age.
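The age comparison amounts to fitting, for each OR subdivision, a slope of the tissue property against age and contrasting the slopes between foveal and peripheral sub-bundles. A minimal sketch with hypothetical variable names (the study's actual statistical modeling may differ):

```python
import numpy as np

def age_slope(age, metric):
    """Ordinary least-squares slope of a tissue property against age."""
    A = np.column_stack([age, np.ones_like(age)])
    slope, _intercept = np.linalg.lstsq(A, metric, rcond=None)[0]
    return slope

# Hypothetical per-subject arrays: mean FA within each OR subdivision.
# slope_foveal = age_slope(age, fa_foveal)      # expected: more negative
# slope_periph = age_slope(age, fa_peripheral)  # expected: less negative
```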